Welcome to deep learning. So in this short video we want to go ahead and look into some basic functions of neural networks, and in particular we want to look into the softmax function and into some ideas of how we could potentially train deep networks.
Okay, so let's start. Activation functions for classification. Now, so far we have described the ground truth by the labels minus one and plus one, but of course we could also use the classes zero and one. So this is really only a matter of definition as long as we only decide between two classes. But if you want to go to more complex cases, you want to be able to classify multiple classes. In this case you probably want to have an output vector, and here you have essentially one dimension per class k. So capital K here is the number of classes, and you can then define a ground-truth representation as a vector that has all zeros except for one position, and that position is the true class. This is also called one-hot encoding, because all of the other parts of the vector are zero and only a single position has a one. And now you try to compute a classifier that will produce a respective vector, and with this vector y hat you can then go ahead and do the classification. So it's essentially like guessing an output probability for each of the classes. In particular for multi-class problems, this has been shown to be an effective way of approaching these problems.
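As a small illustration of this one-hot encoding (a sketch in Python/NumPy, not code from the lecture; the function name and arguments are my own):

```python
import numpy as np

def one_hot(true_class, num_classes):
    """Ground-truth vector: all zeros except a single one at the true class index."""
    y = np.zeros(num_classes)
    y[true_class] = 1.0
    return y

# e.g. the third of K = 4 classes -> [0. 0. 1. 0.]
print(one_hot(2, 4))
```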
Now the problem is that you want to have a kind of probabilistic output between zero and one, but we typically have some arbitrary input vector x that could be arbitrarily scaled. So in order to produce our predictions we employ a trick, and the trick is that we use the exponential function. This is very nice, because the exponential function will map everything into a positive range. Now you want to make sure that the maximum that can be attained is exactly one. So you do that for all of your classes: you apply the exponential function to every element of the input vector and sum these exponentials up, and this sum gives you the maximum that can be attained by this conversion. You then divide each exponentiated input by this number, which will always scale the outputs to the zero-one range, and it will have the property that if you sum up all elements of the vector, the result equals one. This is very nice, because these are two axioms of a probability distribution as introduced by Kolmogorov. So this allows us to always treat the output of the network as a kind of probability. If you look in the literature, or also in software packages, the softmax function is sometimes also known as the normalized exponential function; it's the same thing.
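A minimal sketch of this normalized exponential in Python/NumPy (my own code, not the lecture's; subtracting the maximum before exponentiation is a common numerical-stability assumption, not part of the definition):

```python
import numpy as np

def softmax(x):
    """Normalized exponential: maps an arbitrary real-valued vector to a probability vector."""
    e = np.exp(x - np.max(x))  # shift by the max for numerical stability (standard practice, my assumption)
    return e / np.sum(e)       # dividing by the sum of exponentials makes the outputs sum to one
```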
Now let's look at an example. So let's say this is the input to our neural network; you see this small image on the left. Now you introduce labels for this three-class problem. Wait, there's something missing. It's a four-class problem. So you introduce labels for this four-class problem, and then you have some arbitrary input that is shown here in the column x_k. The values range from minus 3.44 to 3.91. This is not so great, so let's use the exponential function. Now everything is mapped to positive numbers, and there is quite a difference between the numbers. So we need to rescale them, and you can see that the highest probability is of course returned for heavy metal in this image.
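To make this concrete, here is a small usage sketch with the softmax function from above; only the smallest and largest inputs, minus 3.44 and 3.91, are taken from the slide, and the two middle values are made up for illustration:

```python
# continuing the softmax sketch from above
scores = np.array([-3.44, 1.16, -0.81, 3.91])  # hypothetical x_k values; only -3.44 and 3.91 appear on the slide
probs = softmax(scores)
print(probs)        # the largest input, 3.91, receives by far the highest probability
print(probs.sum())  # 1.0 -- the outputs form a valid probability distribution
```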
So let's go ahead and also talk a bit about loss functions. The loss function is a function that tells you how good the prediction of a network is, and a very typical one is the so-called cross-entropy loss. It's the cross entropy that is computed between two probability distributions. So you have your ground-truth distribution and the one that you are estimating, and then you can compute the cross entropy between them in order to determine how well they align with each other, and you can then use this
as a loss function. Here we can use the property that all of the elements of the ground-truth vector will be zero except for the true class, so the cross entropy reduces to the negative logarithm of the predicted probability for the correct class.
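A hedged sketch of this cross-entropy loss for a one-hot ground truth (again my own illustration in Python/NumPy, with made-up numbers):

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Cross entropy between a one-hot ground truth and a predicted probability vector."""
    y_pred = np.clip(y_pred, eps, 1.0)       # clipping avoids log(0); eps is my own stability assumption
    return -np.sum(y_true * np.log(y_pred))  # with a one-hot y_true this is just -log of the true-class probability

# hypothetical four-class example: the true class is the last one
y_true = np.array([0.0, 0.0, 0.0, 1.0])
y_pred = np.array([0.01, 0.04, 0.05, 0.90])
print(cross_entropy(y_true, y_pred))  # ~0.105, i.e. -log(0.9)
```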